ABSTRACT: Observations from inspection by a 'test' method and a standard
method are combined to provide estimators of population proportion, and
of probabilities of misclassification for the test method. Results of
Hochberg and Tenenbein [3] and of Albers and Veldman [1] are extended to
the case where the standard method is not perfect, but its misclassification
probabilities have known values. Both moment and maximum likelihood
estimators are considered and some asymptotic properties of the resulting
estimators are compared.
1. INTRODUCTION


Suppose we have a large population, containing an unknown proportion,
$P$, of individuals possessing a certain characteristic, which we will
call 'nonconformance'. In a random sample, of size $n$, from this population,
the distribution of the number, $X$, say, of nonconforming individuals
will be binomial with parameters $n$, $P$ so that

$$\Pr[X=x] = \binom{n}{x} P^{x}(1-P)^{n-x} \qquad (x = 0, 1, \ldots, n).$$

We will represent this, symbolically, as

$X \sim \mathrm{Bin}(n,P)$, where $\sim$ denotes "is distributed as".

If the individuals in the sample of size $n$ are examined by an
imperfect measuring device, which detects actual nonconformance with
probability $p$, and (incorrectly) 'detects' nonconformance, when the
individual is really not nonconforming, with probability $p'$, then the
distribution of $Z$, the number of individuals declared to be nonconforming
as a result of this inspection, will be binomial with parameters $n$,
$Pp + (1-P)p'$. It is clear that the only parameter that can be estimated
from observations on values of $Z$ in independent samples is $Pp + (1-P)p'$.
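
The lack of identifiability can be illustrated with a minimal sketch. The two parameter sets below are hypothetical values chosen for illustration (they do not come from the paper); both give the same probability of an individual being declared nonconforming, so counts of declared nonconforming individuals alone cannot distinguish them.

```python
def declared_nc_probability(P, p, p_prime):
    """Probability that an individual is declared nonconforming: P*p + (1-P)*p'."""
    return P * p + (1.0 - P) * p_prime


# Two hypothetical parameter sets (illustrative values only).
theta_a = declared_nc_probability(P=0.10, p=0.90, p_prime=0.10)
theta_b = declared_nc_probability(P=0.20, p=0.70, p_prime=0.05)

# Both sets lead to the same binomial parameter for Z.
print(theta_a, theta_b)
```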

Various methods have been suggested for obtaining data from which
estimates of $P$, $p$ and $p'$ can be derived (e.g. Albers and Veldman [1],
Johnson and Kotz [5]). Tenenbein [6] suggested additional inspection
of part of the sample by a perfect measuring device (for which $p=1$ and
$p'=0$) and utilizing the resultant data. This method has been extended by
Hochberg and Tenenbein [3] to allow for inspection of a further sample,
of size $n_S$, say, by the perfect measuring device (S).



In this paper, we study problems arising in this latter situation if
the 'established' measuring device S is not perfect, but has known values
$p_S$, $p'_S$ for $p$, $p'$ respectively. For convenience, we will denote the
(unknown) values of $p$, $p'$ for the measuring device under test (T) by
$p_T$, $p'_T$ respectively.

Problems of this kind arise when it is desired to calibrate the new
device (T), by estimating $p_T$ and $p'_T$. The unknown proportion ($P$) of
nonconforming (NC) units plays the role of a nuisance parameter in such
problems.


We will also assume (when necessary) that $p_S > p'_S$ and $p_T > p'_T$.


2.  ANALYSIS  I  (Moment  Estimation) 


As a consequence of the inspections we have the following sets of
observations:

(i) $n_S$ using S alone, with $Z_S$ judged nonconforming (NC),

(ii) $n_T$ using T alone, with $Z_T$ judged NC,

(iii) $n$ using both S and T, with results shown below:

                     T: # NC      T: # not NC
   S: # NC           $Z_{11}$     $Z_{10}$
   S: # not NC       $Z_{01}$     $Z_{00}$

(# denotes 'number of'.) Evidently, $Z_{11} + Z_{10} + Z_{01} + Z_{00} = n$.

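Under the assumption of random sampling from a population of effectively
infinite size, we have that:

$$Z_S,\ Z_T \text{ and } \mathbf{Z} = \begin{pmatrix} Z_{11} & Z_{10} \\ Z_{01} & Z_{00} \end{pmatrix} \text{ are mutually independent;} \tag{1.1}$$

$$Z_S \sim \mathrm{Bin}(n_S, \theta_S) \text{ with } \theta_S = p_S P + p'_S(1-P); \tag{1.2}$$

$$Z_T \sim \mathrm{Bin}(n_T, \theta_T) \text{ with } \theta_T = p_T P + p'_T(1-P). \tag{1.3}$$

Also, assuming that the S and T classifications are independent, given the
true status of the individual,

$$\mathbf{Z} \sim \mathrm{Multinomial}\left(n;\ \begin{pmatrix} \phi & \theta_S-\phi \\ \theta_T-\phi & 1-\theta_S-\theta_T+\phi \end{pmatrix}\right) \tag{1.4}$$

with $\phi = p_S p_T P + p'_S p'_T (1-P)$, where $\sim$ denotes "is distributed as".

Recall that $p_S$ and $p'_S$ have known values, and $P$ is the (unknown)
proportion of NC individuals in the population.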

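Also

$$P = (\theta_S - p'_S)/(p_S - p'_S) \tag{2.1}$$

$$p_T = (\phi - p'_S\theta_T)/(\theta_S - p'_S) \tag{2.2}$$

$$p'_T = (p_S\theta_T - \phi)/(p_S - \theta_S) \tag{2.3}$$

Now,

$$(n_S+n)\tilde\theta_S = Z_S + Z_{10} + Z_{11} \sim \mathrm{Bin}(n_S+n,\ \theta_S) \tag{3.1}$$

$$(n_T+n)\tilde\theta_T = Z_T + Z_{01} + Z_{11} \sim \mathrm{Bin}(n_T+n,\ \theta_T) \tag{3.2}$$

$$n\tilde\phi = Z_{11} \sim \mathrm{Bin}(n,\ \phi) \tag{3.3}$$

so that $\tilde\theta_S$, $\tilde\theta_T$ and $\tilde\phi$ (as defined in (3.1)-(3.3)) are unbiased
estimators of $\theta_S$, $\theta_T$ and $\phi$ respectively.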

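Hence

$$\tilde P = (p_S - p'_S)^{-1}(\tilde\theta_S - p'_S) \tag{4.1}$$

is an unbiased estimator of $P$. Although the estimators

$$\tilde p_T = (\tilde\theta_S - p'_S)^{-1}(\tilde\phi - p'_S\tilde\theta_T) \tag{4.2}$$

and

$$\tilde p'_T = (p_S - \tilde\theta_S)^{-1}(p_S\tilde\theta_T - \tilde\phi) \tag{4.3}$$

are not unbiased estimators of $p_T$ and $p'_T$ respectively, the biases should not
be large if sample sizes are adequate (see the example later in this section).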
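
As an illustration, the moment estimators can be computed directly from the observed counts. The following is a minimal sketch, not the authors' code; the function name and the counts are hypothetical. The counts are set equal to their expected values for $P = 0.1$, $p_S = 0.9$, $p'_S = 0.1$, $p_T = 0.8$, $p'_T = 0.05$, so the estimators should return these values.

```python
def moment_estimates(z_s, n_s, z_t, n_t, z11, z10, z01, n, p_s, p_s_prime):
    """Moment estimates following (3.1)-(3.3) and (4.1)-(4.3)."""
    theta_s = (z_s + z10 + z11) / (n_s + n)   # proportion judged NC by S
    theta_t = (z_t + z01 + z11) / (n_t + n)   # proportion judged NC by T
    phi = z11 / n                             # proportion judged NC by both
    P = (theta_s - p_s_prime) / (p_s - p_s_prime)
    p_t = (phi - p_s_prime * theta_t) / (theta_s - p_s_prime)
    p_t_prime = (p_s * theta_t - phi) / (p_s - theta_s)
    return {"theta_S": theta_s, "theta_T": theta_t, "phi": phi,
            "P": P, "p_T": p_t, "p_T_prime": p_t_prime}


# Hypothetical counts equal to their expectations (illustrative only).
est = moment_estimates(z_s=180, n_s=1000, z_t=125, n_t=1000,
                       z11=153, z10=207, z01=97, n=2000,
                       p_s=0.9, p_s_prime=0.1)
print(est)
```

With real data the estimates of $p_T$ and $p'_T$ may fall outside the interval from 0 to 1, so the returned values should be checked before use.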


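The variance-covariance matrix of the random variables in (3.1)-(3.3) is

$$\begin{pmatrix}
(n_S+n)\theta_S(1-\theta_S) & n(\phi-\theta_S\theta_T) & n\phi(1-\theta_S) \\
n(\phi-\theta_S\theta_T) & (n_T+n)\theta_T(1-\theta_T) & n\phi(1-\theta_T) \\
n\phi(1-\theta_S) & n\phi(1-\theta_T) & n\phi(1-\phi)
\end{pmatrix}$$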
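
The off-diagonal entries can be checked with a short worked derivation. This is a sketch that assumes only the independence of $Z_S$, $Z_T$ and the two-way table, and the usual multinomial covariances (minus $n$ times the product of the two cell probabilities):

```latex
\begin{align*}
\operatorname{cov}(Z_S + Z_{10} + Z_{11},\ Z_{11})
  &= \operatorname{var}(Z_{11}) + \operatorname{cov}(Z_{10}, Z_{11}) \\
  &= n\phi(1-\phi) - n(\theta_S-\phi)\phi
   = n\phi(1-\theta_S), \\[4pt]
\operatorname{cov}(Z_S + Z_{10} + Z_{11},\ Z_T + Z_{01} + Z_{11})
  &= \operatorname{var}(Z_{11}) + \operatorname{cov}(Z_{10}, Z_{11})
     + \operatorname{cov}(Z_{10}, Z_{01}) + \operatorname{cov}(Z_{11}, Z_{01}) \\
  &= n\phi(1-\phi) - n(\theta_S-\phi)\phi
     - n(\theta_S-\phi)(\theta_T-\phi) - n\phi(\theta_T-\phi) \\
  &= n(\phi - \theta_S\theta_T).
\end{align*}
```

The entry $n\phi(1-\theta_T)$ follows in the same way with the roles of S and T interchanged.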

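Hence (cf. (4.1))

$$\mathrm{var}(\tilde P) = (n_S+n)^{-1}(p_S-p'_S)^{-2}\,\theta_S(1-\theta_S) \tag{5.1}$$

and, using the method of statistical differentials (see, e.g. Johnson and
Kotz [4, Chapter 1, Section 7.5]) we obtain, after some algebraic
manipulation, the approximate formula

$$\begin{aligned}
\mathrm{var}(\tilde p_T) \approx p_T^{2}P^{-2}(p_S-p'_S)^{-2}\Big[ & \big\{n^{-1}\phi(1-\phi) - 2(n_T+n)^{-1}p'_S\phi(1-\theta_T) + (n_T+n)^{-1}p'^{\,2}_S\theta_T(1-\theta_T)\big\}p_T^{-2} \\
 & - 2(n_S+n)^{-1}\big\{\phi(1-\theta_S) - n(n_T+n)^{-1}p'_S(\phi-\theta_S\theta_T)\big\}p_T^{-1} + (n_S+n)^{-1}\theta_S(1-\theta_S)\Big]
\end{aligned} \tag{5.2}$$

An approximate expression for $\mathrm{var}(\tilde p'_T)$ is obtained from (5.2) by
replacing $p_T$ by $p'_T$ and $P$ by $(1-P)$, and interchanging $p_S$ and $p'_S$.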
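
Formula (5.2) is straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical, and the parameter values and sample sizes in the usage lines are chosen only as an example. The three pieces computed inside the function are the variance of $\tilde\phi - p'_S\tilde\theta_T$, its covariance with $\tilde\theta_S$, and the variance of $\tilde\theta_S$.

```python
def approx_var_p_t(P, p_s, p_s_prime, p_t, p_t_prime, n_s, n_t, n):
    """Approximate variance of the moment estimator of p_T, formula (5.2)."""
    theta_s = p_s * P + p_s_prime * (1 - P)
    theta_t = p_t * P + p_t_prime * (1 - P)
    phi = p_s * p_t * P + p_s_prime * p_t_prime * (1 - P)
    # Variance of the numerator of the estimator, phi~ - p'_S * theta_T~.
    var_num = (phi * (1 - phi) / n
               - 2 * p_s_prime * phi * (1 - theta_t) / (n_t + n)
               + p_s_prime ** 2 * theta_t * (1 - theta_t) / (n_t + n))
    # Covariance of the numerator with the denominator, theta_S~ - p'_S.
    cov_num_den = (phi * (1 - theta_s)
                   - n / (n_t + n) * p_s_prime * (phi - theta_s * theta_t)) / (n_s + n)
    # Variance of the denominator.
    var_den = theta_s * (1 - theta_s) / (n_s + n)
    bracket = var_num / p_t ** 2 - 2 * cov_num_den / p_t + var_den
    return p_t ** 2 / (P ** 2 * (p_s - p_s_prime) ** 2) * bracket


# Illustrative values only (assumed for this example).
v = approx_var_p_t(P=0.1, p_s=0.9, p_s_prime=0.1, p_t=0.9, p_t_prime=0.1,
                   n_s=50, n_t=50, n=50)
print(v)
```

The factor $P^{-2}$ makes the approximate variance large when nonconformance is rare and the samples are small, as in this example.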

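An approximate formula for the bias of $\tilde p_T$ is

$$E[\tilde p_T] - p_T \approx p_T\left[\frac{\mathrm{var}(\tilde\theta_S)}{(\theta_S-p'_S)^{2}} - \frac{\mathrm{cov}(\tilde\theta_S,\ \tilde\phi - p'_S\tilde\theta_T)}{(\theta_S-p'_S)(\phi-p'_S\theta_T)}\right] \tag{6}$$

which, after some reduction, gives a proportional bias (i.e. 100(bias)/$p_T$ %)

$$\frac{100\{n_T p'_S(1-\theta_S) + n(1-p'_S)\theta_S\}(\phi-\theta_S\theta_T)}{(n_S+n)(n_T+n)(\theta_S-p'_S)^{2}(\phi-p'_S\theta_T)}\ \% \tag{7}$$

From (2.1)-(2.3)

$$\theta_S - p'_S = (p_S-p'_S)P; \qquad \phi - p'_S\theta_T = (p_S-p'_S)p_T P$$

and also $\phi - \theta_S\theta_T = P(1-P)(p_S-p'_S)(p_T-p'_T)$, so
the approximate proportional bias (7) is

$$\frac{100\{n_T p'_S(1-\theta_S) + n(1-p'_S)\theta_S\}(1-P)(p_T-p'_T)}{(n_S+n)(n_T+n)P^{2}(p_S-p'_S)^{2}p_T}\ \% \tag{7$'$}$$

which is positive and (since $p'_T < p_T$) less than

$$\frac{100\,G(1-P)}{(n_S+n)P^{2}(p_S-p'_S)^{2}}\ \% \tag{8}$$

where

$$G = (n_T+n)^{-1}\{n_T p'_S(1-\theta_S) + n(1-p'_S)\theta_S\}, \tag{9}$$

which lies between $p'_S(1-\theta_S)$ and $(1-p'_S)\theta_S$.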


Example 1. Using as 'typical' values of the probabilities $p_S$, $p'_S$ and $P$ the
values 0.9, 0.1 and 0.1 respectively we find that

$$G = (n_T+n)^{-1}(0.082\,n_T + 0.162\,n)$$

(so that $G$ lies between 0.082 and 0.162) and the approximate proportional bias
of $\tilde p_T$ is between 0 and $1406.25\,G(n_S+n)^{-1}$%. Note that the upper limit is
less than $227.8\,(n_S+n)^{-1}$%, so if $n_S + n > 100$ the approximate proportional bias
is less than 2.28%. The next section contains a numerical assessment of
formula (5.2), without specifying values of $p_T$ and $p'_T$.
3. SOME NUMERICAL APPROXIMATIONS


Utilizing the reasonable assumption that $p_S \gg p'_S$, and neglecting
terms in $p'_S$ and $p'^{\,2}_S$ in the numerator of (5.2) we find

$$\mathrm{var}(\tilde p_T) \approx p_T^{2}P^{-2}(p_S-p'_S)^{-2}\left[\frac{\phi(1-\phi)}{n p_T^{2}} - \frac{2\phi(1-\theta_S)}{(n_S+n)p_T} + \frac{\theta_S(1-\theta_S)}{n_S+n}\right] \tag{10}$$

Taking $p_S = 0.9$, $p'_S = 0.1$ so that $\theta_S = 0.8P + 0.1$ and
$\phi = 0.9\,p_T P + 0.1\,p'_T(1-P)$, we obtain from (10)

$$\mathrm{var}(\tilde p_T) \approx \frac{p_T^{2}}{0.64P^{2}}\left[\frac{\{0.9p_TP + 0.1p'_T(1-P)\}\{1 - 0.9p_TP - 0.1p'_T(1-P)\}}{n p_T^{2}} + \frac{(0.8P+0.1)(0.9-0.8P) - \{1.8p_TP + 0.2p'_T(1-P)\}(0.9-0.8P)p_T^{-1}}{n_S+n}\right]$$

Now taking $P = 0.1$, we find

$$\mathrm{var}(\tilde p_T) \approx \frac{p_T^{2}}{0.0064}\left[\frac{0.09(p_T+p'_T)\{1-0.09(p_T+p'_T)\}}{n p_T^{2}} - \frac{0.1476}{n_S+n}\,\frac{p'_T}{p_T}\right] \tag{11}$$

$$= \frac{14.0625\,(p_T+p'_T)\{1-0.09(p_T+p'_T)\}}{n} - \frac{23.0625\,p_T p'_T}{n_S+n}. \tag{12}$$

Since $0.09(p_T+p'_T)\{1-0.09(p_T+p'_T)\} < \tfrac14$ (because $0 < 0.09(p_T+p'_T) < 1$) and
the second term of (12) is negative, the right hand side of (12) is less than

$$(0.0256\,n)^{-1} < 39.1\,n^{-1}.$$

In the next section we will compare the asymptotic variances and covariances
of $\tilde\theta_S$, $\tilde\theta_T$ and $\tilde P$ with those for maximum likelihood estimators
$\hat\theta_S$, $\hat\theta_T$ and $\hat P$ of $\theta_S$, $\theta_T$ and $P$ respectively.


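4. ANALYSIS II (Maximum Likelihood Estimators)

The likelihood function of $Z_S$, $Z_T$ and $\mathbf{Z}$ is

$$\binom{n_S}{Z_S}\binom{n_T}{Z_T}\binom{n}{Z_{11},Z_{10},Z_{01},Z_{00}}\,\theta_S^{Z_S}(1-\theta_S)^{n_S-Z_S}\,\theta_T^{Z_T}(1-\theta_T)^{n_T-Z_T}\,\phi^{Z_{11}}(\theta_S-\phi)^{Z_{10}}(\theta_T-\phi)^{Z_{01}}(1-\theta_S-\theta_T+\phi)^{Z_{00}}$$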

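Equating derivatives of the log-likelihood to zero gives the following
equations for $\hat\theta_S$, $\hat\theta_T$ and $\hat\phi$:

$$\frac{Z_S}{\hat\theta_S} - \frac{n_S-Z_S}{1-\hat\theta_S} + \frac{Z_{10}}{\hat\theta_S-\hat\phi} - \frac{Z_{00}}{1-\hat\theta_S-\hat\theta_T+\hat\phi} = 0 \tag{13.1}$$

$$\frac{Z_T}{\hat\theta_T} - \frac{n_T-Z_T}{1-\hat\theta_T} + \frac{Z_{01}}{\hat\theta_T-\hat\phi} - \frac{Z_{00}}{1-\hat\theta_S-\hat\theta_T+\hat\phi} = 0 \tag{13.2}$$

$$\frac{Z_{11}}{\hat\phi} - \frac{Z_{10}}{\hat\theta_S-\hat\phi} - \frac{Z_{01}}{\hat\theta_T-\hat\phi} + \frac{Z_{00}}{1-\hat\theta_S-\hat\theta_T+\hat\phi} = 0 \tag{13.3}$$

subject to $0 < \hat\phi < \hat\theta_S, \hat\theta_T < 1$ and $\hat\phi > \hat\theta_S + \hat\theta_T - 1$.

The information matrix (for $\theta_S$, $\theta_T$, $\phi$ in that order) is

$$\begin{pmatrix}
\dfrac{n_S}{\theta_S(1-\theta_S)} + \dfrac{n(1-\theta_T)}{(\theta_S-\phi)(1-\theta_S-\theta_T+\phi)} & \dfrac{n}{1-\theta_S-\theta_T+\phi} & -\dfrac{n(1-\theta_T)}{(\theta_S-\phi)(1-\theta_S-\theta_T+\phi)} \\[10pt]
\dfrac{n}{1-\theta_S-\theta_T+\phi} & \dfrac{n_T}{\theta_T(1-\theta_T)} + \dfrac{n(1-\theta_S)}{(\theta_T-\phi)(1-\theta_S-\theta_T+\phi)} & -\dfrac{n(1-\theta_S)}{(\theta_T-\phi)(1-\theta_S-\theta_T+\phi)} \\[10pt]
-\dfrac{n(1-\theta_T)}{(\theta_S-\phi)(1-\theta_S-\theta_T+\phi)} & -\dfrac{n(1-\theta_S)}{(\theta_T-\phi)(1-\theta_S-\theta_T+\phi)} & \dfrac{n\{\theta_S\theta_T(1-\theta_S-\theta_T)+2\theta_S\theta_T\phi-\phi^{2}\}}{\phi(\theta_S-\phi)(\theta_T-\phi)(1-\theta_S-\theta_T+\phi)}
\end{pmatrix}$$

The determinant is

$$\frac{n\{n_S n_T\gamma + n(n_S+n_T+n)\,\theta_S\theta_T(1-\theta_S)(1-\theta_T)\}}{\phi(\theta_S-\phi)(\theta_T-\phi)(1-\theta_S-\theta_T+\phi)\,\theta_S\theta_T(1-\theta_S)(1-\theta_T)}$$

with $\gamma = \theta_S\theta_T(1-\theta_S-\theta_T+\phi) - \phi(\phi-\theta_S\theta_T)$.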

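And from the asymptotic variance-covariance matrix $V$ (the inverse of the
information matrix) we obtain

$$\mathrm{var}(\hat\theta_S) \approx \theta_S(1-\theta_S)(\gamma n_S^{-1} + \delta N^{-1})/(\gamma+\delta)$$

where $N = n_S + n_T + n$ (= total number of observations) and

$$\delta = nN(n_S n_T)^{-1}\,\theta_S\theta_T(1-\theta_S)(1-\theta_T).$$

The MLE of $P$ is

$$\hat P = (p_S-p'_S)^{-1}(\hat\theta_S - p'_S). \tag{16}$$

The asymptotic efficiency of $\tilde P$ (see (4.1)) relative to $\hat P$ is the same as that
of $\tilde\theta_S$ relative to $\hat\theta_S$, which is

$$100(n_S+n)(\gamma n_S^{-1} + \delta N^{-1})/(\gamma+\delta)\ \% \tag{17}$$

Taking $p_S = p_T = 0.9$, $p'_S = p'_T = 0.1 = P$, as in Example 1, and $n_S = n_T = n$ $(= \tfrac13 N)$
we find $\gamma = 0.0184680$ and $\delta = 0.0653573$, so (17) becomes

$$100(n_S+n)(0.2203\,n_S^{-1} + 0.7797\,N^{-1})\ \% = 100 \times 2(0.2203 + 0.2599)\ \% = 96.04\%.$$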
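
The numerical values in this example can be reproduced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and $\delta$ is taken in the scaled form $nN(n_S n_T)^{-1}\theta_S\theta_T(1-\theta_S)(1-\theta_T)$, which is the form that agrees with the quoted value 0.0653573 when $n_S = n_T = n$. The common sample size of 50 is an assumption; the efficiency depends only on the ratios of the sample sizes.

```python
def gamma_delta(theta_s, theta_t, phi, n_s, n_t, n):
    """Quantities gamma and delta used in the asymptotic variance formulas."""
    N = n_s + n_t + n
    gamma = (theta_s * theta_t * (1 - theta_s - theta_t + phi)
             - phi * (phi - theta_s * theta_t))
    delta = n * N / (n_s * n_t) * theta_s * theta_t * (1 - theta_s) * (1 - theta_t)
    return gamma, delta


def efficiency_moment_theta_s(theta_s, theta_t, phi, n_s, n_t, n):
    """Asymptotic efficiency (%) of the moment estimator of theta_S, formula (17)."""
    N = n_s + n_t + n
    gamma, delta = gamma_delta(theta_s, theta_t, phi, n_s, n_t, n)
    return 100 * (n_s + n) * (gamma / n_s + delta / N) / (gamma + delta)


# p_S = p_T = 0.9, p'_S = p'_T = 0.1 = P give theta_S = theta_T = 0.18, phi = 0.09.
gamma, delta = gamma_delta(0.18, 0.18, 0.09, 50, 50, 50)
eff = efficiency_moment_theta_s(0.18, 0.18, 0.09, 50, 50, 50)
print(gamma, delta, eff)
```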

The asymptotic variance of the MLE $\hat\phi$ is

$$\mathrm{var}(\hat\phi) \approx \frac{\delta\phi(1-\phi) - \delta\phi^{2}N^{-1}\{n_S\theta_S^{-1}(1-\theta_S) + n_T\theta_T^{-1}(1-\theta_T)\} + \phi(\theta_S-\phi)(\theta_T-\phi)(1-\theta_S-\theta_T+\phi)}{n(\gamma+\delta)}$$

On the other hand, recalling that $\mathrm{var}(\tilde\phi) = n^{-1}\phi(1-\phi)$, we find for the
numerical values of the parameters used above, that the asymptotic efficiency
of the moment estimator $\tilde\phi$ is

$$100 \times \frac{0.0653573 \times 0.09\{0.91 - (2/3) \times 0.09 \times (0.18)^{-1} \times 0.82\} + 0.09 \times 0.09^{2} \times 0.73}{0.09 \times 0.91\,(0.0653573 + 0.0184680)}\ \%$$

$$= 100 \times \frac{0.0037449 + 0.0005322}{0.0068653}\ \% = 62.30\%.$$

The markedly lower asymptotic efficiency of $\tilde\phi$ is associated with the fact
that it does not utilize the information on values of $\theta_S$ and $\theta_T$ which is
available from the other $(n_S + n_T)$ observations. Some support for this statement
comes from the asymptotic efficiency of $\tilde\phi$ if the values of $\theta_S$ and $\theta_T$ are known.
This is

$$100 \times \frac{(\theta_S-\phi)(\theta_T-\phi)(1-\theta_S-\theta_T+\phi)}{(1-\phi)\,\gamma}\ \% \tag{19}$$

With the numerical values of $\theta_S$, $\theta_T$ and $\phi$ which we have been using above this
would give an asymptotic efficiency of only 35.18%.
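
Both efficiencies quoted for the moment estimator of $\phi$ can be checked numerically. The following is a minimal sketch under stated assumptions: the function name is hypothetical, $\delta$ is taken in the scaled form $nN(n_S n_T)^{-1}\theta_S\theta_T(1-\theta_S)(1-\theta_T)$ that matches the quoted value 0.0653573, and the common sample size of 50 is assumed for illustration.

```python
def phi_moment_efficiencies(theta_s, theta_t, phi, n_s, n_t, n):
    """Asymptotic efficiency (%) of the moment estimator of phi:
    first with theta_S, theta_T unknown, then with both known."""
    N = n_s + n_t + n
    last_cell = 1 - theta_s - theta_t + phi
    gamma = theta_s * theta_t * last_cell - phi * (phi - theta_s * theta_t)
    delta = n * N / (n_s * n_t) * theta_s * theta_t * (1 - theta_s) * (1 - theta_t)
    cell_product = phi * (theta_s - phi) * (theta_t - phi) * last_cell
    # Asymptotic variance of the MLE of phi when theta_S, theta_T are unknown.
    var_mle = (delta * phi * (1 - phi)
               - delta * phi ** 2 / N * (n_s * (1 - theta_s) / theta_s
                                         + n_t * (1 - theta_t) / theta_t)
               + cell_product) / (n * (gamma + delta))
    var_moment = phi * (1 - phi) / n
    eff_unknown = 100 * var_mle / var_moment
    # Efficiency when theta_S and theta_T are known, formula (19).
    eff_known = 100 * (theta_s - phi) * (theta_t - phi) * last_cell / ((1 - phi) * gamma)
    return eff_unknown, eff_known


eff_unknown, eff_known = phi_moment_efficiencies(0.18, 0.18, 0.09, 50, 50, 50)
print(eff_unknown, eff_known)
```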


5.  CALCULATION  OF  MAXIMUM  LIKELIHOOD  ESTIMATES 

It is not possible to obtain explicit solutions of (13.1)-(13.3) for
$\hat\theta_S$, $\hat\theta_T$ and $\hat\phi$, so a numerical solution must be sought.

An EM algorithm (see, e.g. Dempster et al. [2]) can be constructed in
the following way. Introduce (unobserved) random variables $Z_{ij(S)}$ ($Z_{ij(T)}$)
($i,j = 0,1$) representing the numbers of $i,j$ decision combinations which
would have been obtained if the $n_S$ ($n_T$) individuals tested by S (T) had also
been tested by T (S). (Clearly

$$Z_{10(S)} + Z_{11(S)} = Z_S; \qquad Z_{01(T)} + Z_{11(T)} = Z_T.)$$



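If values of these variables had been observed the maximum likelihood
estimators would have been

For $\theta_S$: $(Z_S + Z_{10(T)} + Z_{11(T)} + Z_{10} + Z_{11})N^{-1}$;

For $\theta_T$: $(Z_{01(S)} + Z_{11(S)} + Z_T + Z_{01} + Z_{11})N^{-1}$;

For $\phi$: $(Z_{11(S)} + Z_{11(T)} + Z_{11})N^{-1}$.

Since

$$E[Z_{10(T)} \mid Z_T] = \frac{(n_T-Z_T)(\theta_S-\phi)}{1-\theta_T}; \qquad E[Z_{11(T)} \mid Z_T] = \frac{Z_T\phi}{\theta_T};$$

$$E[Z_{01(S)} \mid Z_S] = \frac{(n_S-Z_S)(\theta_T-\phi)}{1-\theta_S}; \qquad E[Z_{11(S)} \mid Z_S] = \frac{Z_S\phi}{\theta_S},$$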

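the EM algorithm leads to iteration from $\theta_S^{(v)}$, $\theta_T^{(v)}$, $\phi^{(v)}$ to

$$\theta_S^{(v+1)} = N^{-1}\left[Z_S + \frac{(n_T-Z_T)(\theta_S^{(v)}-\phi^{(v)})}{1-\theta_T^{(v)}} + \frac{Z_T\phi^{(v)}}{\theta_T^{(v)}} + Z_{10} + Z_{11}\right] \tag{20.1}$$

$$\theta_T^{(v+1)} = N^{-1}\left[\frac{(n_S-Z_S)(\theta_T^{(v)}-\phi^{(v)})}{1-\theta_S^{(v)}} + \frac{Z_S\phi^{(v)}}{\theta_S^{(v)}} + Z_T + Z_{01} + Z_{11}\right] \tag{20.2}$$

$$\phi^{(v+1)} = N^{-1}\left[\frac{Z_S\phi^{(v)}}{\theta_S^{(v)}} + \frac{Z_T\phi^{(v)}}{\theta_T^{(v)}} + Z_{11}\right] \tag{20.3}$$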


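Table 1 sets out results of applying the EM algorithm to three
illustrative sets of values of the n's and Z's. In each case
$n_S = n_T = n = 50$; $Z_T = 10$; $Z_{00} = 40$; $Z_{01} = 3$. The remaining values
(which are fixed by the moment estimates in the first row of Table 1) were

   Set      $Z_S$    $Z_{10}$    $Z_{11}$
   (I)        5         1           6
   (II)       8         1           6
   (III)      8         3           4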


Table 1: EM algorithm solutions of equations (13.1)-(13.3)

For each set the three columns give $\theta_S^{(v)}$, $\theta_T^{(v)}$ and $\phi^{(v)}$.

                          Set (I)                    Set (II)                   Set (III)
   v                 theta_S theta_T  phi       theta_S theta_T  phi       theta_S theta_T  phi
   0 (moment
     estimates)      0.1200  0.1900  0.1200     0.1500  0.1900  0.1200     0.1500  0.1700  0.0800
   1                 0.1221  0.1839  0.1154     0.1520  0.1924  0.1248     0.1539  0.1714  0.0865
   2                 0.1240  0.1816  0.1131     0.1522  0.1928  0.1270     0.1553  0.1714  0.0903
   3                 0.1251  0.1805  0.1119     0.1522  0.1929  0.1284     0.1560  0.1712  0.0928
   4                 0.1256  0.1800  0.1111     0.1522  0.1930  0.1294     0.1565  0.1711  0.0945
   5                 0.1259  0.1798  0.1106     0.1522  0.1930  0.1300     0.1568  0.1710  0.0957
   6                 0.1260  0.1797  0.1103     0.1522  0.1930  0.1305     0.1570  0.1709  0.0965
   FINAL             0.1261  0.1796  0.1098     0.1523  0.1931  0.1315     0.1574  0.1707  0.0984

The initial values $\theta_S^{(0)}$, $\theta_T^{(0)}$ and $\phi^{(0)}$ were the moment estimates. The
table shows the results of the first six iterations and the final values, to four
decimal places. (Speed of convergence can be improved, of course, by using
modified values of $\theta_S^{(v)}$, $\theta_T^{(v)}$ and $\phi^{(v)}$ for the $(v+1)$-th iteration, taking
account of trends in values.)
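
The iteration can be sketched in a few lines of code. This is an illustrative implementation under stated assumptions, not the authors' program: the function names and the dictionary layout are hypothetical, and the counts used for set (I) ($Z_S = 5$, $Z_{10} = 1$, $Z_{11} = 6$) are the values implied by the moment estimates in the first row of Table 1 together with $n_S = n_T = n = 50$, $Z_T = 10$, $Z_{00} = 40$ and $Z_{01} = 3$.

```python
def em_step(theta_s, theta_t, phi, d):
    """One EM iteration for (theta_S, theta_T, phi); d holds the observed counts."""
    N = d["n_s"] + d["n_t"] + d["n"]
    # E-step: expected values of the unobserved counts given current parameters.
    e10_t = (d["n_t"] - d["z_t"]) * (theta_s - phi) / (1.0 - theta_t)
    e11_t = d["z_t"] * phi / theta_t
    e01_s = (d["n_s"] - d["z_s"]) * (theta_t - phi) / (1.0 - theta_s)
    e11_s = d["z_s"] * phi / theta_s
    # M-step: complete-data maximum likelihood estimates.
    new_theta_s = (d["z_s"] + e10_t + e11_t + d["z10"] + d["z11"]) / N
    new_theta_t = (e01_s + e11_s + d["z_t"] + d["z01"] + d["z11"]) / N
    new_phi = (e11_s + e11_t + d["z11"]) / N
    return new_theta_s, new_theta_t, new_phi


def moment_start(d):
    """Moment estimates, used as starting values."""
    theta_s = (d["z_s"] + d["z10"] + d["z11"]) / (d["n_s"] + d["n"])
    theta_t = (d["z_t"] + d["z01"] + d["z11"]) / (d["n_t"] + d["n"])
    phi = d["z11"] / d["n"]
    return theta_s, theta_t, phi


def run_em(d, tol=1e-10, max_iter=10000):
    """Iterate until successive estimates differ by less than tol."""
    est = moment_start(d)
    for _ in range(max_iter):
        new = em_step(*est, d)
        if max(abs(a - b) for a, b in zip(new, est)) < tol:
            return new
        est = new
    return est


# Set (I), with counts inferred as described above.
set_one = {"n_s": 50, "n_t": 50, "n": 50,
           "z_s": 5, "z_t": 10, "z11": 6, "z10": 1, "z01": 3}
first = em_step(*moment_start(set_one), set_one)
final = run_em(set_one)
print(first, final)
```

The sketch uses plain EM steps with no acceleration; the result should be checked against the constraints on $\hat\theta_S$, $\hat\theta_T$ and $\hat\phi$ before it is used.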

The maximum likelihood estimates of $P$, $p_T$ and $p'_T$ are obtained by replacing
$\theta_S$, $\theta_T$ and $\phi$ in (2.1)-(2.3) by their maximum likelihood estimates. We obtain
the following formulas (provided the values lie between 0 and 1).

   Set     $\hat P$                         $\hat p_T$                                  $\hat p'_T$
   (I)     $(0.1261-p'_S)/(p_S-p'_S)$      $(0.1098-0.1796p'_S)/(0.1261-p'_S)$        $(0.1796p_S-0.1098)/(p_S-0.1261)$
   (II)    $(0.1523-p'_S)/(p_S-p'_S)$      $(0.1315-0.1931p'_S)/(0.1523-p'_S)$        $(0.1931p_S-0.1315)/(p_S-0.1523)$
   (III)   $(0.1574-p'_S)/(p_S-p'_S)$      $(0.0984-0.1707p'_S)/(0.1574-p'_S)$        $(0.1707p_S-0.0984)/(p_S-0.1574)$

In order to satisfy the conditions $0 \le \hat P, \hat p_T, \hat p'_T \le 1$ we need
$p_S \ge \max(\hat\theta_S, \hat\phi/\hat\theta_T) > \min(\hat\theta_S, \hat\phi/\hat\theta_T) \ge p'_S$. These conditions, for sets (I)-(III), are

(I) $p_S > 0.611$; $p'_S < 0.126$

(II) $p_S > 0.681$; $p'_S < 0.152$

(III) $p_S > 0.576$; $p'_S < 0.157$.

[If the conditions are not met, then appropriate boundary values
(0 if formula gives a negative value, 1 if it gives a value greater than 1)
can be used.]
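
The substitution into (2.1)-(2.3), together with the boundary rule, can be sketched as follows. The function names are hypothetical, and the values of $p_S$ and $p'_S$ in the usage lines are assumptions chosen only to illustrate an interior case and a case where the boundary value 1 is used; the estimates of $\theta_S$, $\theta_T$ and $\phi$ are the final values for set (I) in Table 1.

```python
def clip01(x):
    """Replace values below 0 by 0 and values above 1 by 1."""
    return min(1.0, max(0.0, x))


def mle_parameters(theta_s, theta_t, phi, p_s, p_s_prime):
    """Estimates of P, p_T and p'_T from (2.1)-(2.3), with boundary values applied."""
    P = (theta_s - p_s_prime) / (p_s - p_s_prime)
    p_t = (phi - p_s_prime * theta_t) / (theta_s - p_s_prime)
    p_t_prime = (p_s * theta_t - phi) / (p_s - theta_s)
    return clip01(P), clip01(p_t), clip01(p_t_prime)


# Set (I) final values; p_S and p'_S below are assumed for illustration.
interior = mle_parameters(0.1261, 0.1796, 0.1098, p_s=0.7, p_s_prime=0.01)
clipped = mle_parameters(0.1261, 0.1796, 0.1098, p_s=0.9, p_s_prime=0.1)
print(interior, clipped)
```

In the second call the formula for $p_T$ gives a value greater than 1, so the boundary value 1 is returned.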


6.  CONCLUDING  REMARKS 

The estimates of $p_T$, $p'_T$ and $P$ depend on the values assumed for $p_S$ and $p'_S$.
If these values are incorrect, biases will be introduced. The way in which
the values used for $p_S$ and $p'_S$ affect the estimates can easily be appreciated
from equations (2.1)-(2.3). For example, increase in either $p_S$ or $p'_S$ will
tend to lead to negative bias in estimates of $P$ (remembering that $\theta_S < p_S$).

In this paper we have been concerned with estimation of $p_T$, $p'_T$ (and also
$P$), supposing $p_S$, $p'_S$ known. This has been effected via estimation of the
parameters $\theta_S$, $\theta_T$ and $\phi$. The same analysis can be used in other circumstances.
For example, if $P$ (proportion of nonconforming items) and $p_S$ are known, then
$p_T$, $p'_T$ and $p'_S$ can be estimated using the relationships

$$p'_S = (\theta_S - p_S P)(1-P)^{-1}$$

$$p_T = \frac{\phi(1-P) - (\theta_S - p_S P)\theta_T}{(p_S - \theta_S)P}$$

together with (2.3) for $p'_T$.


Of course, if $P$ is known, as well as $p_S$ and $p'_S$, then $\theta_S$ is known and
there is no need to take any observations with S alone - that is we can
take $n_S = 0$.